A comparison of AUC estimators in small-sample studies

Authors

  • Antti Airola
  • Tapio Pahikkala
  • Willem Waegeman
  • Bernard De Baets
  • Tapio Salakoski
Abstract

Reliable estimation of the classification performance of learned predictive models is difficult when working in a small-sample setting. When dealing with biological data, it is often the case that separate test data cannot be afforded. In this case, cross-validation is a typical strategy for estimating performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from extensive bias or variance when the area under the ROC curve (AUC) is used as the performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method previously proposed by us can be used to efficiently calculate this estimate for regularized least-squares (RLS) based learners.

Similar resources

Comparison of Small Area Estimation Methods for Estimating Unemployment Rate

Extended Abstract. In recent years, the need for small area estimation has greatly increased for large surveys, particularly household surveys at the Statistical Centre of Iran (SCI), because of costs and respondent burden. The lack of suitable auxiliary variables between two decennial housing and population censuses is a challenge for SCI in using these methods. In general, the...

Comparison of the Gamma kernel and the orthogonal series methods of density estimation

The standard kernel density estimator suffers from boundary bias for probability density functions of distributions on the positive real line. The Gamma kernel estimators and orthogonal series estimators are two alternatives which are free of boundary bias. In this paper, a simulation study is conducted to compare small-sample performance of the Gamma kernel estimators and the orthog...

Some New Developments in Small Area Estimation

Small area estimation has received a lot of attention in recent years due to growing demand for reliable small area statistics. Traditional area-specific estimators may not provide adequate precision because sample sizes in small areas are seldom large enough. This makes it necessary to employ indirect estimators based on linking models. Basic area level and unit level models have been extensiv...

Small Area Estimation of the Mean of Household's Income in Selected Provinces of Iran with Hierarchical Bayes Approach

Extended Abstract. Small area estimation has received a lot of attention in recent years due to the demand for reliable small area statistics. Direct estimators may not provide adequate precision, because sample sizes in small areas are seldom large enough. Hence, by employing models that can use auxiliary information and area effects in descriptions, one can increase the precision of direct...

Parametric Estimation in a Recurrent Competing Risks Model

A resource-efficient approach to making inferences about the distributional properties of the failure times in a competing risks setting is presented. Efficiency is gained by observing recurrences of the competing risks over a random monitoring period. The resulting model is called the recurrent competing risks model (RCRM) and is coupled with two repair strategies whenever the system fails. ...

Nonparametric analysis of clustered data in diagnostic trials: Estimation problems in small sample sizes

In diagnostic trials, clustered data are obtained when several subunits (e.g., organs or vessels) of the same patient are observed where no, several, or all subunits may be diseased or non-diseased as classified by a gold standard. In such a design, repeated measures appear in a natural way since the same patient is observed under different conditions by several readers and the repeated measures m...

Publication date: 2010